Abstract. The well-known Gilbert-Shannon-Reeds model for riffle shuffles assumes that
the cards are initially cut 'about in half' and then riffled together. We analyze a natural
variant where the initial cut is biased. Extending results of Fulman (1998), we show a
sharp cutoff in separation and L-infinity distances. This analysis is possible due to the
close connection between shuffling and quasisymmetric functions along with some complex
analysis of a generating function.

1. Introduction 

We analyze a natural one-parameter model for riffle shuffling a deck of $n$ cards. Roughly,
the deck is cut into two piles with a binomial$(n, \theta)$ distribution. Then the piles are riffled
together sequentially according to the following rule: if the left pile has $A$ cards and the right
pile has $B$ cards, then drop the next card from the bottom of the left pile with probability
$A/(A + B)$. Continue until all cards are dropped. Starting at the identity, let $P_\theta(w)$ be the
probability of the permutation $w$ after one such $\theta$-shuffle. Define convolution by

    (1.1)   $P_\theta^{*k}(w) = \sum_{v} P_\theta(v)\, P_\theta^{*(k-1)}(v^{-1} w),$

and define the uniform distribution by $U(w) = 1/n!$.

When $\theta = 1/2$, this is the widely studied Gilbert-Shannon-Reeds model. The natural
version with biased cuts was studied by [DFP92], [Lal96], [Lal00] and most thoroughly by
[Ful98]. A literature review is in Section 2 below. Here we study the rate of convergence in
the separation and $\ell^\infty$ metrics:

    (1.2)   $\mathrm{SEP}(k) = \max_{w} \left( 1 - \frac{P_\theta^{*k}(w)}{U(w)} \right),$

    (1.3)   $\ell_\infty(k) = \max_{w} \left| 1 - \frac{P_\theta^{*k}(w)}{U(w)} \right|.$

Note that $\mathrm{SEP}(k)$ is bounded above by 1, and $\ell_\infty(k)$ can be as large as $n! - 1$. Further,
both $\mathrm{SEP}(k)$ and $\ell_\infty(k)$ are upper bounds for the total variation metric:

    $\|P_\theta^{*k} - U\|_{TV} = \frac{1}{2} \sum_{w} |P_\theta^{*k}(w) - U(w)| \le \mathrm{SEP}(k) \le \ell_\infty(k).$
A main result of this note gives closed form expressions

    (1.4)   $\mathrm{SEP}(k) = 1 - \sum_{w \in \mathfrak{S}_n} \mathrm{sgn}(w) \prod_{i=1}^{n} \left(\theta^i + (1-\theta)^i\right)^{k\, n_i(w)},$

    (1.5)   $\ell_\infty(k) = \sum_{w \in \mathfrak{S}_n} \prod_{i=1}^{n} \left(\theta^i + (1-\theta)^i\right)^{k\, n_i(w)} - 1,$

where $n_i(w)$ is the number of $i$-cycles in the permutation $w$. Using these formulae we prove
the following.
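To make (1.4) and (1.5) concrete, the following is a minimal sketch that evaluates both sums
for small $n$ by grouping permutations according to cycle type. The function names
(`partitions`, `class_size`, `sep_and_linf`) are illustrative and not from the paper; passing
`theta` as a `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from math import factorial


def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, max_part), 0, -1):
        for rest in partitions(n - part, part):
            yield (part,) + rest


def class_size(lam):
    """Number of permutations with cycle type lam: n! / prod_i i^{n_i} n_i!."""
    z = 1
    for i in set(lam):
        m = lam.count(i)
        z *= i**m * factorial(m)
    return factorial(sum(lam)) // z


def sep_and_linf(n, theta, k):
    """Evaluate the right-hand sides of (1.4) and (1.5)."""
    signed = 0
    unsigned = 0
    for lam in partitions(n):
        weight = class_size(lam)
        for i in lam:  # one factor per cycle of length i
            weight *= (theta**i + (1 - theta) ** i) ** k
        signed += (-1) ** (n - len(lam)) * weight  # sgn(w) = (-1)^(n - #cycles)
        unsigned += weight
    return 1 - signed, unsigned - 1


if __name__ == "__main__":
    sep, linf = sep_and_linf(3, Fraction(1, 2), 1)
    print(sep, linf)
```

For $n = 3$, $\theta = 1/2$ and $k = 1$ the reversal cannot be reached by a single two-pile
shuffle, so the separation is 1, while the identity has probability $1/2$, giving
$\ell_\infty = 3! \cdot \tfrac12 - 1 = 2$.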

Theorem 1. For the $\theta$-biased riffle shuffle measure on $\mathfrak{S}_n$, let

    (1.6)   $k = \frac{2 \log n - \log 2 + c}{-\log(\theta^2 + (1-\theta)^2)}.$

Then

    (1.7)   $\mathrm{SEP}(k) \sim 1 - \exp(-e^{-c}),$

    (1.8)   $\ell_\infty(k) \sim \exp(e^{-c}) - 1$

for any fixed real $c$ as $n$ tends to $\infty$. Here $0 < \theta < 1$ is fixed.
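As a small numerical illustration of (1.6), the sketch below evaluates the number of shuffles
$k$ as a real number for given $n$, $\theta$ and $c$. The name `cutoff_steps` is hypothetical,
and since Theorem 1 is an asymptotic statement, the value for a finite deck is only indicative.

```python
import math


def cutoff_steps(n, theta, c=0.0):
    """Real-valued k from (1.6): (2 log n - log 2 + c) / (-log(theta^2 + (1-theta)^2))."""
    beta = theta**2 + (1 - theta) ** 2  # chance that two cards receive the same label
    return (2 * math.log(n) - math.log(2) + c) / (-math.log(beta))


if __name__ == "__main__":
    for theta in (0.5, 0.6, 0.75, 0.9):
        print(theta, round(cutoff_steps(52, theta), 2))
```

For $\theta = 1/2$ the formula reduces to $2\log_2 n - 1 + c/\log 2$, and the printed values grow
as $\theta$ moves away from $1/2$.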

An upper bound on separation of this form is given in [Ful98]. Theorem 1 shows this
bound is tight, holds also for $\ell_\infty$, and establishes the cutoff phenomenon. Note that, as a
function of $\theta$, $k$ as defined in (1.6) above is smallest when $\theta = 1/2$, so unbiased cuts lead to
fastest mixing.

Background on Markov chains and shuffling is given in Section 2. There is an intimate
connection between these biased shuffles and quasisymmetric functions explained in Section 3,
where we prove (1.4) and (1.5). The upper bound in [Ful98] is derived using a strong
stationary time. This is shown to be exact and equivalent to (1.4) in Section 4. The proof of
Theorem 1, which has extensions to allow $\theta$ to depend on $n$ (e.g. $\theta = 1/n$), is in Section 5.



2. Riffle Shuffling 

A superb introduction to Markov chains which treats riffle shuffling and stationary times
is the book by [LPW09]. The analysis of riffle shuffling has connections to algebra, geometry
and combinatorics; a detailed survey is in [Dia03]. The results and references in [ADS11]
and [CHIP] bring this up to date.

For present purposes, the following extension is needed. Let $1 < a < \infty$, and let
$\theta = (\theta_1, \theta_2, \dots, \theta_a)$, with $0 < \theta_i < 1$ and $\theta_1 + \cdots + \theta_a = 1$, be fixed. A $\theta$-shuffle of a
deck of $n$ cards proceeds as follows: Choose $\{N_i\}_{i=1}^{a}$ from the multinomial$(n, \theta)$ distribution,
that is, with the distribution of $n$ balls being dropped into $a$ boxes independently according to $\theta$.
Cut the deck into $a$ packets of sizes $N_1, N_2, \dots, N_a$ (some of the packets may be empty).
Now sequentially drop cards from the bottom of each packet, choosing to drop from packet $i$
with probability proportional to its current packet size. Continue until all cards have been
dropped into a single pile. Let $P_\theta$ denote the associated measure on $\mathfrak{S}_n$. Note that several
more detailed descriptions of $P_\theta$ appear in [Ful98].
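The description above can be sketched directly as a simulation. This is only an illustration
under stated assumptions: the deck is a Python list with index 0 as the top card, and the
function name `theta_shuffle` and the `rng` parameter are not from the paper.

```python
import random


def theta_shuffle(deck, theta, rng=random):
    """Perform one forward theta-shuffle of `deck` (index 0 is the top card)."""
    n = len(deck)
    # Multinomial cut: each of n balls lands in box i with probability theta[i].
    sizes = [0] * len(theta)
    for box in rng.choices(range(len(theta)), weights=theta, k=n):
        sizes[box] += 1
    # Cut the deck into consecutive packets of those sizes, starting from the top.
    packets, start = [], 0
    for size in sizes:
        packets.append(deck[start:start + size])
        start += size
    # Drop cards from packet bottoms, choosing a packet proportionally to its size.
    dropped = []
    while any(packets):
        i = rng.choices(range(len(packets)), weights=[len(p) for p in packets])[0]
        dropped.append(packets[i].pop())  # bottom card of packet i
    dropped.reverse()  # the first card dropped ends up at the bottom of the new pile
    return dropped


if __name__ == "__main__":
    print(theta_shuffle(list(range(10)), [0.3, 0.7], random.Random(1)))
```

Because cards within a packet keep their relative order, a shuffle of the sorted deck with
$a$ packets has at most $a - 1$ values $v$ for which $v + 1$ sits above $v$.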

When $a = 2$ and $\theta_1 = \theta_2 = 1/2$, this is the basic Gilbert-Shannon-Reeds measure.
When $a = 2$ and $\theta_1 = \theta$, $\theta_2 = 1 - \theta$, this is the $\theta$-biased shuffle studied in the present
paper. The measures $P_\theta$ were studied by [DFP92] who prove that they convolve nicely:
if $\theta = (\theta_1, \dots, \theta_a)$ and $\eta = (\eta_1, \dots, \eta_b)$, then set
$\theta * \eta = (\theta_1\eta_1, \dots, \theta_1\eta_b, \theta_2\eta_1, \dots, \theta_a\eta_b)$, a
vector of length $ab$.

Proposition 2 ([DFP92]). On $\mathfrak{S}_n$, we have

    $P_\theta * P_\eta = P_{\theta * \eta}.$

Thus $P_\theta^{*k} = P_{\theta^{*k}}$, and the combinatorics of $P_\theta$ determines the convolution powers. [Ful98]
works out many properties of these measures giving closed formulae and asymptotics for
the distribution of cycle structure, inversions and descents.
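The product $\theta * \eta$ is easy to compute explicitly. The sketch below (with hypothetical
helper names `star`, `star_power` and `power_sum`) builds $\theta^{*k}$ and checks numerically
that the power sums are multiplicative, $p_j(\theta * \eta) = p_j(\theta)\, p_j(\eta)$, which is
the property used later in the form $p_n(X^{*k}) = (p_n(X))^k$.

```python
from fractions import Fraction
from itertools import product


def star(theta, eta):
    """Return theta * eta = (theta_1 eta_1, ..., theta_1 eta_b, theta_2 eta_1, ...)."""
    return [t * e for t, e in product(theta, eta)]


def star_power(theta, k):
    """Return the k-fold product theta * ... * theta (a vector of length a^k)."""
    out = [Fraction(1)]
    for _ in range(k):
        out = star(out, theta)
    return out


def power_sum(j, xs):
    """Power sum p_j evaluated at the finite list xs."""
    return sum(x**j for x in xs)


if __name__ == "__main__":
    theta = [Fraction(1, 3), Fraction(2, 3)]
    cube = star_power(theta, 3)
    print(len(cube), sum(cube), power_sum(2, cube) == power_sum(2, theta) ** 3)
```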
When $\theta = 1/2$, a sharp analysis of the rate of convergence for the Gilbert-Shannon-Reeds
measure in total variation distance appears in [BD92]. It is an open problem to give
a similarly sharp analysis for the measures $P_\theta^{*k}$.

Hyperplane walks. Our $\theta$-shuffles may be studied from other points of view as well.
They are a special case of hyperplane walks introduced in [BHR99] and further studied
in [BD98] and more recently in [AD10] and [DPR11]. Further, they fall into the class of
"Hopf-square" walks studied in [DPR11]. Each of these perspectives adds to our picture.
A brief commentary follows.

The braid arrangement is based on the $\binom{n}{2}$ hyperplanes $H_{ij} = \{x \in \mathbb{R}^n \mid x_i = x_j\}$,
$1 \le i < j \le n$. This divides $\mathbb{R}^n$ into chambers and faces. As shown in [BHR99], the
chambers are indexed by permutations and the faces are indexed by block ordered set
partitions. There is a simple projection operator which, given a chamber $C$ and a face $F$,
returns the chamber $C * F$ that is adjacent to $F$ and closest to $C$ (in the sense of crossing
the fewest number of hyperplanes). Details are in [BHR99], [BD98]. It is shown there that
projection operates as a kind of inverse riffle shuffle. Put a probability measure on faces of the
form $(S, S^c)$, with $S \subseteq [n]$, giving probability $\theta^{|S|}(1 - \theta)^{n - |S|}$ to each ($S$ may be empty). The
resulting hyperplane walk may be explained as follows: Picture a deck of $n$ cards in order.
For each card, flip an independent $\theta$-coin. Remove all cards where the coin comes up heads,
keeping their relative order fixed, and move them to the top of the deck. This is precisely
an inverse $\theta$-shuffle.
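The coin-flipping description translates into a few lines of code. The following is a minimal
sketch, assuming the deck is a list with index 0 as the top card; the name
`inverse_theta_shuffle` and the `rng` parameter are illustrative only.

```python
import random


def inverse_theta_shuffle(deck, theta, rng=random):
    """One inverse theta-shuffle: cards whose coin shows heads move to the top."""
    heads = [rng.random() < theta for _ in deck]  # independent theta-coins
    top = [card for card, h in zip(deck, heads) if h]       # relative order kept
    rest = [card for card, h in zip(deck, heads) if not h]  # relative order kept
    return top + rest


if __name__ == "__main__":
    print(inverse_theta_shuffle(list(range(10)), 0.7, random.Random(2)))
```

Applied to a sorted deck, the output is a concatenation of two increasing runs, so it has at
most one descent.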

The theory of [BHR99], [BD98] gives useful expressions for the eigenvalues of any
hyperplane walk. Specialized to $\theta$-shuffles, they show there is one eigenvalue $\beta_w$ for each
permutation $w \in \mathfrak{S}_n$. Further, [DPR11] gives a description of the left eigenvectors. These
give right eigenvectors and eigenvalues of the "forward" $\theta$-shuffles.

As one example, [BD98] gives a rate of convergence after $k$ steps. In the present case,
this reads

    (2.1)   $\|K^k - U\|_{TV} \le \sum_{1 \le i < j \le n} \beta_{ij}^k$

with $\beta_{ij} = \sum_{F \subseteq H_{ij}} w(F)$. By symmetry, $\beta_{ij} = \beta_{12}$ is constant in $i, j$. The sum is
over all set partitions $(S, S^c)$ where either $\{1,2\} \subseteq S$ or $\{1,2\} \subseteq S^c$. So $\{1,2\} \subseteq S$ contributes
$\sum_{A \subseteq \{3, \dots, n\}} \theta^{2 + |A|}(1 - \theta)^{n - 2 - |A|} = \theta^2$, the complement contributes $(1 - \theta)^2$, and so
$\beta_{ij} = \theta^2 + (1 - \theta)^2$. The bound above becomes

    (2.2)   $\|K^k - U\|_{TV} \le \binom{n}{2} \left(\theta^2 + (1 - \theta)^2\right)^k.$

This is exactly the birthday bound derived differently below. Of course, these are just upper
bounds, and it is of interest to know if they can be improved. The theory developed below
shows that

    (2.3)   $\|K^k - U\|_{TV} \le \mathrm{SEP}(k) \le \binom{n}{2} \left(\theta^2 + (1 - \theta)^2\right)^k$

for fixed $\theta$ in $(0, 1)$. Theorem 1 shows that $\mathrm{SEP}(k) \sim \binom{n}{2} \left(\theta^2 + (1 - \theta)^2\right)^k$, so the bound is
best possible.

Recall that any $w \in \mathfrak{S}_n$ has a unique factorization as a product of decreasing Lyndon
words: $w = \ell_1 \ell_2 \cdots \ell_k$. Here $\ell_i$ is Lyndon if it is lexicographically least among all cyclic
rearrangements (so 132 is Lyndon but 213 is not). For example $236415 = 236 \cdot 4 \cdot 15$. The
theorem in [DPR11] shows

    (2.4)   $\beta_w = \prod_{i=1}^{k} \left(\theta^{|\ell_i|} + (1 - \theta)^{|\ell_i|}\right)$

where $|\ell_i|$ is the length of the Lyndon word $\ell_i$. If $w$ is the reverse of the identity, then
all $|\ell_i| = 1$ and $\beta_w = 1$. The second eigenvalue is $\theta^2 + (1 - \theta)^2$ with multiplicity $\binom{n}{2}$,
so the bound (2.3) uses precisely these eigenvalues. More generally, the eigenvalues
are $\prod_{i=1}^{n} \left(\theta^i + (1 - \theta)^i\right)^{a_i}$ for any $0 \le a_i \le n$ with $\sum_i i\, a_i = n$, each with multiplicity
$n! / \prod_i i^{a_i} a_i!$.

3. Quasisymmetric Functions

Background on symmetric function theory is in [Mac95] with [Sta99] developing the extension
to quasisymmetric functions. We work with infinitely many variables $X = \{x_i\}_{i=1}^{\infty}$. The
space of quasisymmetric functions homogeneous of degree $n$ has dimension $2^{n-1}$. A basis
for this space is indexed by subsets of $[n-1] = \{1, 2, \dots, n-1\}$ or, equivalently, by compositions
of $n$. We use the following bijection between subsets $D = \{D_1 < D_2 < \cdots < D_{a-1}\}$
of $[n-1]$ and compositions $\alpha = (\alpha_1, \alpha_2, \dots, \alpha_a)$ of $n$ to identify subsets and compositions,
which we denote by $\alpha \leftrightarrow D(\alpha)$:

    $(\alpha_1, \alpha_2, \dots, \alpha_a) \longmapsto \{\alpha_1, \alpha_1 + \alpha_2, \dots, \alpha_1 + \cdots + \alpha_{a-1}\},$

    $(D_1, D_2 - D_1, \dots, n - D_{a-1}) \longleftarrow \{D_1 < D_2 < \cdots < D_{a-1}\}.$

The monomial quasisymmetric function basis is defined by

    (3.1)   $M_\alpha(X) = \sum_{i_1 < i_2 < \cdots < i_a} x_{i_1}^{\alpha_1} x_{i_2}^{\alpha_2} \cdots x_{i_a}^{\alpha_a}.$

For example, $M_{(1,2,1)}(X) = \sum_{i_1 < i_2 < i_3} x_{i_1} x_{i_2}^2 x_{i_3}$.

The fundamental quasisymmetric function basis of [Ges84] is defined by

    (3.2)   $Q_D(X) = \sum_{\substack{i_1 \le \cdots \le i_n \\ i_j < i_{j+1} \text{ if } j \in D}} x_{i_1} \cdots x_{i_n}.$

For example, for $n = 4$, $Q_{\{1\}}(X) = \sum_{i_1 < i_2 \le i_3 \le i_4} x_{i_1} x_{i_2} x_{i_3} x_{i_4}$. Expressed in terms of
monomial quasisymmetric functions, $Q_{\{1\}}(X) = M_{(1,3)}(X) + M_{(1,2,1)}(X) + M_{(1,1,2)}(X) +
M_{(1,1,1,1)}(X)$. In general, the fundamental basis is related to the monomial basis by

    (3.3)   $Q_{D(\beta)}(X) = \sum_{\alpha \text{ refines } \beta} M_\alpha(X),$

where a composition $\alpha$ of length $a$ refines the composition $\beta$ of length $b$ if there exist indices
$0 = i_0 < i_1 < i_2 < \cdots < i_b = a$ such that $\alpha_{i_{j-1}+1} + \cdots + \alpha_{i_j} = \beta_j$. For example, both $(1, 2, 1)$ and
$(1, 1, 2)$ refine $(1, 3)$ but $(2, 1, 1)$ does not.
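The bijection between compositions and subsets, and the refinement relation, can be checked
mechanically. The sketch below uses hypothetical helper names (`comp_to_set`, `set_to_comp`,
`refines`) and relies on the fact, used in the proof of Proposition 4, that $\alpha$ refines
$\beta$ if and only if $D(\alpha)$ contains $D(\beta)$.

```python
from itertools import accumulate


def comp_to_set(alpha):
    """Map a composition (a_1, ..., a_r) to D(alpha) = {a_1, a_1 + a_2, ...}."""
    return set(accumulate(alpha[:-1]))  # partial sums, excluding the total n


def set_to_comp(D, n):
    """Map a subset D of [n-1] back to the composition of n with D(alpha) = D."""
    cuts = [0] + sorted(D) + [n]
    return tuple(b - a for a, b in zip(cuts, cuts[1:]))


def refines(alpha, beta):
    """True if alpha refines beta, i.e. both sum to n and D(alpha) contains D(beta)."""
    return sum(alpha) == sum(beta) and comp_to_set(alpha) >= comp_to_set(beta)


if __name__ == "__main__":
    for alpha in [(1, 2, 1), (1, 1, 2), (2, 1, 1)]:
        print(alpha, comp_to_set(alpha), refines(alpha, (1, 3)))
```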

[Sta01], based on results in [Ful98], established a sharp connection between $\theta$-shuffling
and quasisymmetric functions.

Theorem 3 ([Sta01] (Theorem 2.1)). Let $w \in \mathfrak{S}_n$ and $\theta = (\theta_1, \theta_2, \dots, \theta_a)$ be given. Then

    $P_\theta(w) = Q_{\mathrm{iDes}(w)}(\theta),$

where $\mathrm{iDes}(w) = \mathrm{Des}(w^{-1})$ is the inverse descent set of $w$.

This identification together with (3.3) gives a useful inequality which shows that separation
and $\ell_\infty$ are achieved at the reversal and the identity permutations, respectively.
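Proposition 4. For permutations $w$ and $u$, if $\mathrm{iDes}(w)$ contains $\mathrm{iDes}(u)$, then $\mathrm{Prob}(w) \le
\mathrm{Prob}(u)$ with equality if and only if $\mathrm{iDes}(w) = \mathrm{iDes}(u)$.

Proof. First note that $\alpha$ refines $\beta$ if and only if $D(\alpha)$ contains $D(\beta)$. Let $\alpha$ and $\beta$ be such
that $D(\alpha) = \mathrm{iDes}(w)$ and $D(\beta) = \mathrm{iDes}(u)$. From (3.3) and the transitivity of refinement,
we have

    $Q_{D(\beta)}(X) = \sum_{\gamma \text{ refines } \beta} M_\gamma(X)$

    $\phantom{Q_{D(\beta)}(X)} = \sum_{\gamma \text{ refines } \alpha} M_\gamma(X) + \sum_{\substack{\gamma' \text{ refines } \beta \\ \gamma' \text{ does not refine } \alpha}} M_{\gamma'}(X)$

    $\phantom{Q_{D(\beta)}(X)} = Q_{D(\alpha)}(X) + \sum_{\substack{\gamma' \text{ refines } \beta \\ \gamma' \text{ does not refine } \alpha}} M_{\gamma'}(X).$

Furthermore, $\alpha \ne \beta$ if and only if $\beta$ does not refine $\alpha$, in which case the summand contains
the term $M_\beta$. Since the $x_i$ are probabilities, they are all nonnegative, thus making $Q_{D(\alpha)}$
strictly less than $Q_{D(\beta)}$. □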

In the partial order on subsets or, equivalently, compositions, $[n-1] = D(1^n)$ is the unique
minimal element and $\emptyset = D(n)$ is the unique maximal element. Therefore Proposition 4
has the following consequence.

Corollary 5. For any $\theta$, we have

    $\mathrm{SEP}(P_\theta) = 1 - n! \cdot Q_{[n-1]}(\theta),$
    $\ell_\infty(P_\theta) = \max\left(1 - n! \cdot Q_{[n-1]}(\theta),\; n! \cdot Q_{\emptyset}(\theta) - 1\right).$

Remark 6. When $\theta = (\theta, 1 - \theta)^{*k}$, we show below that the maximum is taken on at the
second argument. This is not always the case. On the cyclic group $C_3$, with $\mu(1) = \mu(-1) =
\tfrac12$, $\mu(0) = 0$, we have $3\mu(1) - 1 = \tfrac12$ and $1 - 3\mu(0) = 1$.

For some permutations, the associated quasisymmetric functions are easy to write down.
This happens in particular if the quasisymmetric function is symmetric. Below we need the
elementary symmetric functions $e_n(X)$, the complete homogeneous symmetric functions
$h_n(X)$, and the power sum symmetric functions $p_n(X)$. For $\lambda$ a partition with $n_i = n_i(\lambda)$
parts equal to $i$, define $e_\lambda = \prod_i e_i^{n_i}$, $h_\lambda = \prod_i h_i^{n_i}$, $p_\lambda = \prod_i p_i^{n_i}$. As $\lambda$ ranges over partitions
of $n$, these are the familiar bases for the homogeneous symmetric functions of degree $n$.

Note that

    (3.4)   $e_n(X) = Q_{[n-1]}(X), \quad h_n(X) = Q_{\emptyset}(X), \quad p_n(X^{*k}) = (p_n(X))^k.$

Theorem 7. For any $\theta$, with $\mathrm{id} = 1, 2, \dots, n$ and $\mathrm{rev} = n, n-1, \dots, 1$, we have

    (3.5)   $P_\theta^{*k}(\mathrm{rev}) = \sum_{\lambda \vdash n} (-1)^{n - \ell(\lambda)} z_\lambda^{-1} \prod_{i=1}^{n} p_i(\theta)^{k\, n_i(\lambda)},$

    (3.6)   $P_\theta^{*k}(\mathrm{id}) = \sum_{\lambda \vdash n} z_\lambda^{-1} \prod_{i=1}^{n} p_i(\theta)^{k\, n_i(\lambda)},$

where $\ell(\lambda)$ is the number of parts of $\lambda$ and $z_\lambda = \prod_i i^{n_i(\lambda)} n_i(\lambda)!$.
Proof. The result follows from Theorem 3, (3.4) and the standard expansions [Mac95]

    (3.7)   $e_n = \sum_{\lambda \vdash n} \epsilon_\lambda z_\lambda^{-1} p_\lambda \quad \text{and} \quad h_n = \sum_{\lambda \vdash n} z_\lambda^{-1} p_\lambda,$

where $\epsilon_\lambda = (-1)^{n - \ell(\lambda)}$. □

Remark 8. For both (3.5) and (3.6) in the theorem, when $\lambda = (1^n)$, $z_\lambda^{-1} = 1/n!$ and
$\prod_i p_i(\theta)^{k\, n_i(\lambda)} = 1$. Thus the lead term is $1/n!$ and all other terms are strictly less than 1.
As $k$ tends to $\infty$, these terms tend to 0 and $P_\theta^{*k}(\mathrm{id}) \sim P_\theta^{*k}(\mathrm{rev}) \sim \frac{1}{n!}$. Of course, our work
is to quantify this convergence.

Corollary 9. For any $\theta$ and all $k \ge 0$, we have

    $\mathrm{SEP}(P_\theta^{*k}) = 1 - n!\, P_\theta^{*k}(\mathrm{rev}),$
    $\ell_\infty(P_\theta^{*k}) = n!\, P_\theta^{*k}(\mathrm{id}) - 1.$

Proof. The first equality follows from the definition. For the second equality,

    $\ell_\infty(P_\theta^{*k}) = \max\left(1 - n!\, P_\theta^{*k}(\mathrm{rev}),\; n!\, P_\theta^{*k}(\mathrm{id}) - 1\right).$

In comparing terms, the 1 cancels in both, and the second term is a sum of positive terms
while the first has the same terms, some with negative signs. □

Specializing to $\theta$-biased shuffles, Corollary 9 and Theorem 7 imply (1.4) and (1.5).
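For small decks, Theorem 7 can be checked by brute force. The sketch below is an illustration
under assumptions, not the paper's method: it uses Proposition 2 to replace $k$ shuffles by one
$\theta^{*k}$-shuffle, and the card-labeling description of an inverse shuffle (label each card
independently, then stable-sort by label). Since id and rev are their own inverses, their
probabilities under the inverse and forward shuffles agree. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial, prod


def partitions(n, max_part=None):
    """Yield the partitions of n as non-increasing tuples."""
    max_part = n if max_part is None else max_part
    if n == 0:
        yield ()
        return
    for part in range(min(n, max_part), 0, -1):
        for rest in partitions(n - part, part):
            yield (part,) + rest


def z(lam):
    """z_lambda = prod_i i^{n_i} n_i!."""
    return prod(i ** lam.count(i) * factorial(lam.count(i)) for i in set(lam))


def theorem7(n, theta, k):
    """Right-hand sides of (3.6) and (3.5): (P^{*k}(id), P^{*k}(rev))."""
    p_id = p_rev = Fraction(0)
    for lam in partitions(n):
        term = prod(sum(t**i for t in theta) ** k for i in lam) / z(lam)
        p_id += term
        p_rev += (-1) ** (n - len(lam)) * term
    return p_id, p_rev


def brute_force(n, theta, k):
    """Enumerate all labelings of n cards with labels drawn from theta^{*k}."""
    weights = [Fraction(1)]
    for _ in range(k):
        weights = [w * t for w in weights for t in theta]
    p_id = p_rev = Fraction(0)
    for labels in product(range(len(weights)), repeat=n):
        w = prod(weights[l] for l in labels)
        order = sorted(range(n), key=lambda i: labels[i])  # stable sort by label
        if order == list(range(n)):
            p_id += w
        if order == list(range(n - 1, -1, -1)):
            p_rev += w
    return p_id, p_rev


if __name__ == "__main__":
    theta = (Fraction(1, 3), Fraction(2, 3))
    print(theorem7(3, theta, 2) == brute_force(3, theta, 2))
```

The enumeration has $a^{kn}$ terms, so it is only practical for very small $n$ and $k$.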

4. Strong Stationary Times 

Repeated shuffling from any of the measures in Section 2 forms a Markov chain $\mathrm{id} =
W_0, W_1, W_2, \dots$ taking values in $\mathfrak{S}_n$. A strong stationary time (SST) $T$ is a stopping time
(meaning $P\{T > k\}$ only depends on $W_0, W_1, \dots, W_k$) such that for all $k \ge 0$ and all
$w \in \mathfrak{S}_n$,

    (4.1)   $P\{W_k = w \mid T \le k\} = U(w).$

We will build an SST for the Markov chain induced by $P_\theta^{*k}$. A basic proposition of this
theory [LPW09][Lemma 6.1] is

    (4.2)   $\mathrm{SEP}(k) \le P\{T > k\}$ for all $k \ge 0$.

Further, [AD87] shows that there always exists a fastest SST $T^*$ satisfying (4.2) with equality
for all $k$.

Background on stationary times is in [DF90]. In this section, we build a fastest SST
(following [AD87] and [Ful98]) involving a birthday problem to bound the right hand side
of (4.2). Solving this birthday problem by inclusion-exclusion gives a probabilistic proof
of (1.4), Theorem 7 and even the expression for the elementary symmetric function $e_n$ in
terms of the power sums (3.7).

Constructing an SST for $P_\theta^{*k}$. Consider the inverse process in which cards are labeled $i$
with probability $\theta_i$ independently. Then all the cards labeled 1 are removed, keeping them
in their same relative order, followed by all cards labeled 2, and so on. This is one inverse
$\theta$-shuffle. Repetitions may be realized by labeling each card with a vector with coordinates
chosen independently from $\theta$. The first shuffle is read off the first coordinate of each card,
the second shuffle off the second coordinate, and so on. Conceptually, each card may be
labeled with a vector of infinite length.

Consider the first time $T$ that the first $T$ coordinates of the $n$ cards are distinct. Repeated
inverse shuffling sorts the vectors lexicographically, leaving the card with the smallest vector
on top, the next smallest second, and so on. By symmetry, at time $T$, the deck is uniformly
distributed, even conditional on $T = k$. This is (4.1). Further, this $T$ is fastest. To see
this, note that the reversal permutation $\mathrm{rev} = n, n-1, \dots, 1$ is a halting state: if
$W_k = \mathrm{rev}$, then $T \le k$. Indeed, if $W_k = \mathrm{rev}$, then every pair of cards must have distinct
labels. Existence of a halting state implies that $T$ is fastest ([DF90][Remark 2.39] and
[LPW09][Remark 6.12]), separation is achieved at rev, and

    (4.3)   $\mathrm{SEP}(k) = P\{T > k\}$ for all $k \ge 0$.

To work with the right hand side of (4.3), let $A_{ij}$ be the event that the first $k$ coordinates
of the labels on the cards $i$ and $j$ are equal. Thus $P\{A_{ij}\} = \left(\sum_a \theta_a^2\right)^k$ and

    (4.4)   $\{T > k\} = \bigcup_{1 \le i < j \le n} A_{ij}.$

Bounding the probability of the union by the sum of the probabilities yields

    (4.5)   $\mathrm{SEP}(k) \le \binom{n}{2} \left(\sum_a \theta_a^2\right)^k.$

This bound is also derived in [Ful98]. The asymptotics of Section 5 show it is quite accurate.
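To see how close the union bound (4.5) is to the exact tail in (4.3), one can compute
$P\{T > k\}$ directly for small cases: all $n$ label vectors of length $k$ are distinct with
probability $n!\, e_n(\theta^{*k})$. The following is a minimal sketch with hypothetical names
(`elementary`, `tail_T`, `birthday_bound`); it enumerates the $a^k$ label probabilities, so it
is meant for modest $k$ only.

```python
from fractions import Fraction
from math import comb, factorial


def elementary(n, xs):
    """Elementary symmetric polynomial e_n(xs) by the standard one-pass recurrence."""
    e = [1] + [0] * n
    for x in xs:
        for r in range(n, 0, -1):  # go downwards so each x is used at most once
            e[r] += x * e[r - 1]
    return e[n]


def tail_T(n, theta, k):
    """Exact P{T > k}: some two of the n cards share their first k coordinates."""
    weights = [1]
    for _ in range(k):
        weights = [w * t for w in weights for t in theta]  # entries of theta^{*k}
    return 1 - factorial(n) * elementary(n, weights)


def birthday_bound(n, theta, k):
    """Right-hand side of (4.5)."""
    return comb(n, 2) * sum(t * t for t in theta) ** k


if __name__ == "__main__":
    theta = (Fraction(3, 10), Fraction(7, 10))
    for k in (4, 8, 12):
        print(k, float(tail_T(5, theta, k)), float(birthday_bound(5, theta, k)))
```

For $n = 2$ there is a single pair of cards, so the bound is an equality; for larger $n$ the two
quantities approach each other as $k$ grows.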



Inclusion-Exclusion and the Birthday Problem. Consider this version of the birthday
problem: $n$ balls are dropped independently into $B$ boxes with the chance of box $i$ being $\eta_i$.
If $B_{ij}$ is the event that balls $i$ and $j$ both wind up in the same box, the chance of success
(having two or more balls in the same box) is

    (4.6)   $P(\text{success}) = P\left( \bigcup_{1 \le i < j \le n} B_{ij} \right).$

Elementary considerations show that the chance of failure (all balls in distinct boxes) is
expressible using elementary symmetric functions $e_n$ as $1 - P(\text{success}) = n!\, e_n(\eta_1, \dots, \eta_B)$.
Using the expression for $e_n$ in terms of the power sums (3.7) gives

    (4.7)   $P(\text{success}) = 1 - \sum_{w \in \mathfrak{S}_n} \mathrm{sgn}(w)\, p_{\lambda(w)}(\eta) = 1 - n! \sum_{\lambda \vdash n} (-1)^{n - \ell(\lambda)} z_\lambda^{-1} p_\lambda(\eta).$

The inclusion-exclusion expansion of (4.6) gives a sum of polynomials which must match
the neat expressions in (4.7). This may be seen explicitly using the inclusion-exclusion
formula for the chromatic polynomial in [Sta95]. For example,

    $P\{B_{1,2} \cup B_{1,3} \cup B_{2,3}\} = 3P(B_{1,2}) - 3P(B_{1,2} \cap B_{2,3}) + P(B_{1,2} \cap B_{1,3} \cap B_{2,3}) = 3\left(\sum_i \eta_i^2\right) - 2\left(\sum_i \eta_i^3\right),$

while (4.7) gives $-6\left(-\tfrac12 p_{(2,1)}(\eta) + \tfrac13 p_{(3)}(\eta)\right)$, matching (4.6).
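The $n = 3$ identity can be confirmed numerically. This sketch (the function name
`collision_probability` and the sample vector `eta` are illustrative choices) enumerates all
box assignments and compares the collision probability with $3p_2(\eta) - 2p_3(\eta)$.

```python
from fractions import Fraction
from itertools import product


def collision_probability(eta, n):
    """P(at least two of n balls share a box), by enumerating all assignments."""
    total = Fraction(0)
    for boxes in product(range(len(eta)), repeat=n):
        if len(set(boxes)) < n:  # some box holds two or more balls
            weight = Fraction(1)
            for b in boxes:
                weight *= eta[b]
            total += weight
    return total


if __name__ == "__main__":
    eta = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
    p2 = sum(x**2 for x in eta)
    p3 = sum(x**3 for x in eta)
    print(collision_probability(eta, 3), 3 * p2 - 2 * p3)
```

With three boxes of probabilities $1/2, 1/3, 1/6$, all balls land in distinct boxes with
probability $3! \cdot \tfrac{1}{36} = \tfrac16$, so both expressions equal $\tfrac56$.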

Remark 10. Since separation is achieved (uniquely) at the reversal permutation, (4.3), (4.4),
(4.6), (4.7) give a probabilistic proof of Theorem 7.

Remark 11. This connection between inclusion-exclusion, birthday problems and symmetric
functions seems generally useful. See, for example, [MS04][pg. 604-605].
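This section derives the asymptotic results of Theorem 1 and some extensions. Without
loss of generality, suppose $1/2 \le \theta < 1$. To bound the $\ell^\infty$ distance, using Corollary 9
together with (1.5) and (1.4), we are interested in

    (5.1)   $\ell(k, n) = \sum_{w \in \mathfrak{S}_n} \prod_{j=1}^{n} \theta_j^{k\, n_j(w)},$

where $\theta_j = \theta^j + (1 - \theta)^j$ and $n_j(w)$ denotes the number of $j$-cycles in the permutation $w$. If

    $f_n(x_1, \dots, x_n) = \sum_{w \in \mathfrak{S}_n} \prod_{j=1}^{n} x_j^{n_j(w)},$

then we have the identity

    (5.2)   $\sum_{n=0}^{\infty} \frac{z^n}{n!} f_n(x_1, \dots, x_n) = \exp\left( \sum_{j=1}^{\infty} \frac{x_j z^j}{j} \right).$

Therefore we have that

    (5.3)   $\sum_{n=0}^{\infty} \frac{\ell(k, n)}{n!} z^n = \exp\left( \sum_{j=1}^{\infty} \frac{\theta_j^k z^j}{j} \right).$

Theorem 12. Define

    $M = M(k, n) = \sum_{j=2}^{\infty} n^j \theta_j^k.$

If $M \le \sqrt{n}/(10 \log n)$, then we have

    $\ell(k, n) = \exp\left( \sum_{j=2}^{\infty} \frac{n^j \theta_j^k}{j} \right) \left( 1 + O\left( \frac{1 + M}{\sqrt{n}} \right) \right).$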
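The exponential generating function $\sum_n \ell(k,n) z^n / n! = \exp\left(\sum_j \theta_j^k z^j / j\right)$
also gives a quick way to compute $\ell(k, n)$ exactly. Differentiating both sides yields the
recurrence $m\, a_m = \sum_{j=1}^{m} \theta_j^k a_{m-j}$ for $a_m = \ell(k, m)/m!$. The sketch
below implements that recurrence; the function name `ell` is hypothetical, and the comparison
with the exponential in Theorem 12 in the usage example is only a rough numerical illustration
for a finite $n$.

```python
from fractions import Fraction
from math import exp, factorial


def ell(k, n, theta):
    """Compute l(k, n) = sum over permutations w of prod_j theta_j^(k n_j(w))."""
    # x[j] = theta_j^k with theta_j = theta^j + (1 - theta)^j; x[0] is unused.
    x = [None] + [(theta**j + (1 - theta) ** j) ** k for j in range(1, n + 1)]
    a = [Fraction(1)] + [Fraction(0)] * n  # a[m] = l(k, m) / m!
    for m in range(1, n + 1):
        a[m] = sum(x[j] * a[m - j] for j in range(1, m + 1)) / m
    return factorial(n) * a[n]


if __name__ == "__main__":
    n, k, theta = 52, 14, Fraction(1, 2)
    exact = float(ell(k, n, theta))
    approx = exp(sum(n**j * float(theta**j + (1 - theta) ** j) ** k / j
                     for j in range(2, 40)))
    print(exact, approx)
```

For $n = 3$ the recurrence reproduces the direct count $1 + 3\theta_2^k + 2\theta_3^k$ (one
identity, three transpositions, two 3-cycles).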

Proof. Set $F_k(z) = \sum_{j=1}^{\infty} \theta_j^k z^j / j$. By the residue theorem we have

    $\ell(k, n) = \frac{n!}{2\pi i} \oint_{|z| = n} \exp(F_k(z))\, z^{-n}\, \frac{dz}{z} = \frac{n!}{2\pi n^n} \int_{-\pi}^{\pi} \exp\left(F_k(n e^{ix}) - inx\right) dx.$

We divide the integral into the ranges when $|x| \le (\log n)/\sqrt{n}$, which gives the main
contribution, and $\pi \ge |x| > (\log n)/\sqrt{n}$.

Consider first the range $|x| \le (\log n)/\sqrt{n}$. Here we have

    $F_k(n e^{ix}) = n e^{ix} + \sum_{j=2}^{\infty} \frac{n^j}{j} \theta_j^k e^{ijx} = n e^{ix} + \sum_{j=2}^{\infty} \frac{n^j}{j} \theta_j^k + O(|x| M),$

since $e^{ijx} = 1 + O(j|x|)$. Therefore, using $n e^{ix} = n + inx - nx^2/2 + O(|x|^3 n)$ and Stirling's
formula, the integral over this region is

    $\frac{n!}{2\pi n^n} \int_{|x| \le (\log n)/\sqrt{n}} \exp\left( n - \frac{n x^2}{2} + \sum_{j=2}^{\infty} \frac{n^j}{j} \theta_j^k + O\left(|x|^3 n + |x| M\right) \right) dx,$

which reduces to

    (5.4)   $\exp\left( \sum_{j=2}^{\infty} \frac{n^j}{j} \theta_j^k \right) \left( 1 + O\left( \frac{1 + M}{\sqrt{n}} \right) \right).$

Now consider the range $\pi \ge |x| > (\log n)/\sqrt{n}$. Here we have

    $\mathrm{Re}(F_k(n e^{ix})) \le F_k(n) - n(1 - \cos(x)) \le F_k(n) - c(\log n)^2,$

for some positive constant $c$. Using Stirling's formula again, the contribution of this segment
of the integral is therefore

    $\ll \frac{n!}{n^n} \exp\left(F_k(n) - c(\log n)^2\right) \ll \sqrt{n}\, \exp\left( \sum_{j=2}^{\infty} \frac{n^j}{j} \theta_j^k - c(\log n)^2 \right),$

which may be absorbed into our error term. □

From this Theorem we can read off the behavior of the $\ell^\infty$ distance after $k$ biased shuffles.
First consider the case when $(1 - \theta) \log n$ is large. In this range put

    (5.5)   $k = \frac{2 \log n - \log 2 + c}{-\log(\theta^2 + (1 - \theta)^2)}.$

We find that the contribution to $M(k, n)$ arises mainly from $j = 2$ and so $M(k, n) \sim 2e^{-c}$,
and we have

    (5.6)   $\ell(k, n) \sim \exp(e^{-c}),$

so that the $\ell^\infty$ distance behaves like $\exp(e^{-c}) - 1$, and similarly the separation distance
behaves like $1 - \exp(-e^{-c})$, in agreement with [DFP92].

Next consider the case when $(1 - \theta) \log n = \kappa \in [0, \infty)$. Keep the notation above for $k$;
here we find that $n^2 \theta_2^k = 2e^{-c}$, as before, and for $j \ge 3$,

    (5.7)   $n^j \theta_j^k \sim \exp\left( \tfrac{j}{2} (-\kappa + \log 2 - c) \right).$

Therefore, if $c > \log 2 - \kappa$, then $M(k, n)$ is small, and Theorem 12 applies. Moreover in this
case we have

    (5.8)   $\ell(k, n) \sim \exp\left( e^{-c} + \sum_{j=3}^{\infty} \frac{1}{j} \exp\left( \tfrac{j}{2} (-\kappa + \log 2 - c) \right) \right).$

Finally, consider the extreme case $\theta = 1 - 1/n$. It is convenient here to define $k =
n \log n + cn$. Then $n^j \theta_j^k \sim e^{-jc}$ for $j \ge 2$, and $M(k, n)$ is small provided $c > 0$. In that case
we have

    (5.9)   $\ell(k, n) \sim \exp\left( \sum_{j=2}^{\infty} \frac{e^{-jc}}{j} \right).$

Compare with Theorem 1.1 of [DFP92].